Introduction

ShinyBaseball is an R package containing several Shiny apps to illustrate Statcast and Retrosheet baseball data. The package can be used to understand how pitch outcomes vary by pitch type and location for a specific pitcher or a specific hitter. One can see how the locations of pitches depend on the pitch type and the count. Also, by visualizing the locations of in-play events, one can see the hot and cold regions for specific hitters. Statcast data from the 2019 season is included with the package.

This document provides an overview with snapshots of the Shiny functions available in ShinyBaseball version 0.5.3.

Installation

This package depends on the following packages, which should be installed first:

shiny, ggplot2, dplyr, stringr, tidyr, lubridate, ggrepel
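A minimal sketch for checking which of these dependencies are still missing before installing them. The helper name missing_packages() is hypothetical and is not part of ShinyBaseball; it relies only on base R's requireNamespace().

```r
# Hypothetical helper (not part of ShinyBaseball): return the packages
# in `pkgs` that cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

deps <- c("shiny", "ggplot2", "dplyr", "stringr", "tidyr", "lubridate", "ggrepel")
to_install <- missing_packages(deps)

# Uncomment to install whatever is missing from CRAN.
# if (length(to_install) > 0) install.packages(to_install)
```

The install step is left commented out so that the check can be run without changing the local library.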

To install the ShinyBaseball package, use the install_github() function from the remotes package:

library(remotes)
install_github("bayesball/ShinyBaseball")

Learning Shiny Apps

There are several apps demonstrating some interactive graphics features of Shiny.

PointBrush()

This app illustrates scatterplot brushing. One selects X and Y variables to graph and then brushes a region of interest; the app displays the names and the X and Y values of the points in that region.

PointBrush()

PointClick() and PointHover()

These apps illustrate the clicking and hovering capabilities of Shiny graphics. Again one selects variables to graph. By clicking on or hovering near a point, one sees the name and the X and Y values for that point.

PointClick()

FourMeasures() - Brushing Illustration

This is useful for exploring relationships between four variables in the FanGraphs batting leaderboard. One selects X1, Y1, X2, Y2 variables to graph. One sees two scatterplots. One can brush either scatterplot - the corresponding points in the other scatterplot are colored red. Also the app displays the names and variables corresponding to the points in the brushed region.

FourMeasures()

PitchOutcome() - Visualizing Pitch Outcomes

This app is helpful for visualizing pitch outcomes for any pitcher or batter of interest.

We use the Player Type button to indicate whether we want to look at pitching or batting data. Then we enter the Player Name and the Pitch Type of interest.

Here we indicate that we wish to look at a Pitcher, enter Jacob deGrom’s name, and select “FF” from the Pitch Type palette.

We see the locations of all of Jacob deGrom’s four-seamers, where the color of each point corresponds to the Type variable (ball, strike, or inplay).

PitchOutcome()

By selecting “Called” from the Pitches to Display palette, we see the locations of all called pitches, where the color corresponds to called ball or called strike.

By selecting “Swing” from the Pitches to Display palette, we see the locations of all swung pitches, where the color of each point corresponds to the outcome (foul, inplay or miss). One can brush over this scatterplot; the app then displays the number of swung pitches and the miss rate for the points in the brushed region.

By selecting “In-Play” from the Pitches to Display palette, we see the locations of all pitches put in play, where the color of each point corresponds to the outcome (hit or out). One can brush over this scatterplot; the app then displays the number of pitches in play and the hit rate for the points in the brushed region.
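The brushed-region summaries described above amount to filtering the plotted pitches to a rectangle and computing a rate. Here is a minimal sketch in base R, assuming a hypothetical data frame with columns plate_x, plate_z and outcome; the app's own column names and code are not shown in this document, and the pitch values are made up.

```r
# Made-up swung pitches: horizontal and vertical location plus outcome.
swings <- data.frame(
  plate_x = c(-0.5, 0.2, 0.8, 0.1, -0.9, 0.4),
  plate_z = c(2.0, 3.1, 2.6, 1.4, 3.4, 2.9),
  outcome = c("miss", "foul", "inplay", "miss", "foul", "miss")
)

# Count the swings inside a rectangle and the fraction of them that miss.
brush_summary <- function(d, xmin, xmax, ymin, ymax) {
  inside <- d$plate_x >= xmin & d$plate_x <= xmax &
    d$plate_z >= ymin & d$plate_z <= ymax
  sel <- d[inside, ]
  data.frame(swings = nrow(sel), miss_rate = mean(sel$outcome == "miss"))
}

res <- brush_summary(swings, xmin = 0, xmax = 1, ymin = 2.5, ymax = 3.5)
res
```

In a Shiny app the four limits would typically come from the plot's brush input rather than being typed in by hand.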

PitchTypeCount() - Pitch Locations by Type and Count

This app is helpful for comparing pitch locations across pitch types and counts.

We enter the Pitcher Name (here Jacob deGrom). By selecting “FF” and “SL” from the Pitch Type palette and “0-0” from the Count palette, we can compare the locations of four-seamers and sliders on a 0-0 count.

PitchTypeCount()

By selecting “FF” and “SL” from the Pitch Type palette and “2-0”, “1-1” and “0-2” from the Count palette, we can compare the locations of four-seamers and sliders on these three “2 pitch” counts.

BrushingZone() - Visualizing In-Play Outcomes

This app is a general function for plotting and brushing different measures on balls put in-play over the zone.

A live version of BrushingZone() can be found on the RStudio Shiny Server:

https://bayesball.shinyapps.io/BrushingZone/

We begin by entering in a batter’s name – here we enter Bryce Harper.

We see the locations of all pitches put in play where the color of the point corresponds to the launch speed.

BrushingZone()

If one clicks on an individual point, one sees the launch speed, launch angle and expected batting average for that ball put into play.

If one brushes over this plot, the app displays the number of balls in play (BIP), the number of hits (H) and home runs (HR), and the average values of launch speed, hit rate, home run rate and expected BA for the points in the brushed region.

If one selects Home Run, the points are colored by the outcome (home run or not).

This app can also be used to show locations of hits or expected batting average over the zone.

SprayChart() - Visualizing Locations of In-Play Events

This app shows the spatial locations of all balls put in play by a particular hitter.

NOTE: The graphs are constructed so that the “pull” direction is always on the left side of the display. This makes it easier to compare hitters who bat from different sides.
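One way to obtain such a display is to mirror the horizontal coordinate for left-handed batters. This sketch assumes that the coordinate is centered on home plate with negative values toward left field. The column names stand and x and the values are hypothetical; the package's actual transformation is not described here.

```r
# Mirror the horizontal location for left-handed batters so that the
# pull side is always on the left (negative x) of the display.
pull_adjust <- function(x, stand) {
  ifelse(stand == "L", -x, x)
}

bip <- data.frame(
  stand = c("R", "L", "L", "R"),
  x     = c(-40, 55, -10, 20)   # made-up values; negative = left-field side
)
bip$x_pull <- pull_adjust(bip$x, bip$stand)
bip
```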

One enters the name of a hitter; here we enter Mike Trout. If “All” is selected as the Batted Ball Type, one sees the locations of all balls in play, where the color corresponds to the batted ball type. A table at the bottom gives the frequency distribution of the batted ball types, and the subtitle shows the hit rate on balls in play (BIP).

SprayChart()

If one selects “Fly ball”, “Ground ball”, “Line drive”, or “Pop up”, one sees the locations of all batted balls of that type. The color of the point corresponds to the batted ball outcome (Hit or Out).

Below we see the locations of Trout’s flyballs. Note that his hit rate on flyballs is 0.348.

SprayCompare() - Compare In-Play Locations for Two Batters

Using this app, one can compare the locations of batted balls for two hitters. One enters the names of two batters; here we compare Mike Trout and George Springer. With Batted Ball Type selected as “All”, one sees the locations of all batted balls, where the color corresponds to the batted ball type. At the bottom, a table of the frequencies of batted ball types is displayed for both batters.

SprayCompare()

If one selects “Fly ball”, “Ground ball”, “Line drive”, or “Pop up”, one sees the locations of all batted balls of that type for both batters.

PitcherFourSeam() - Visualizing Rates of Four-Seam Fastballs Over the Zone for One Pitcher

This app shows values of different rate statistics for four-seam fastballs computed over regions of the strike zone for a specific pitcher.

One inputs the name of a pitcher, specific Statcast seasons to include, and the type of rate desired. There are five types of rates:

  • location – the percentage of four-seamers that fall in each region of the zone
  • swing – the percentage of four-seamers that are swung at for each region
  • miss – the percentage of four-seamers missed on swings for each region
  • hit – the batting average on balls put into play on four-seamers for each region
  • HR – the home run percentage on balls put into play on four-seamers for each region

For example, by choosing the Rates tab, here are the rates of Jacob deGrom’s four-seamers over the zone.

If one chooses the Residuals tab, the app computes the difference between deGrom’s location rates and the overall location rates for that period. We see that deGrom throws a greater percentage of four-seamers high in the zone to left-handed hitters.

By choosing the Z Scores tab, we assess the difference in rates by use of a Z statistic. Values larger than 2 in absolute value are considered meaningful. We see that deGrom indeed throws a greater fraction of four-seamers high in the zone to left-handers, since the Z scores there mostly exceed 2.
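This document does not give the formula behind the Z Scores tab. As a hedged illustration, a standard two-proportion Z statistic with a pooled standard error can compare a pitcher's rate in one region with the rate for a comparison group. The function name two_prop_z() and the counts below are made up for illustration.

```r
# Two-proportion Z statistic with a pooled standard error (an assumed
# form; the app's actual computation may differ).
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  pooled <- (x1 + x2) / (n1 + n2)
  se <- sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
  (p1 - p2) / se
}

# Made-up counts: 60 of 400 four-seamers in a region for one pitcher,
# 10000 of 100000 for the comparison group.
z <- two_prop_z(60, 400, 10000, 100000)
abs(z) > 2
```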

One can look at other rates. For example, here are the miss rates of deGrom. Right-handed hitters are pretty likely to miss a deGrom fastball high in the zone.

BatterFourSeam() - Visualizing Rates of Four-Seam Fastballs Over the Zone for One Batter

This app shows values of different rate statistics for four-seam fastballs computed over regions of the strike zone for a specific batter. The design of this app is very similar to that of PitcherFourSeam().

One inputs the name of a batter, specific Statcast seasons to include, and the type of rate desired.

As an example, here is a display of Mike Trout’s in-play batting average on four-seamers thrown by right- and left-handed pitchers.

Here is a display of Mike Trout’s in-play home run percentages on four-seamers thrown by right- and left-handed pitchers.

RadialChart() - A Radial Chart of Balls in Play for a Pitcher in a Specific Game

This illustrates the use of a Baseball Savant Radial Chart to show the launch angle and exit velocity of balls put into play.

To use this app, one types the name of a starting pitcher and date that he pitched during the 2019 season. All of the possible starting dates for a given pitcher are listed to make it easier to input the date.

Here is a display of the launch variables of balls put into play for Aaron Nola during the game that he pitched on March 28, 2019.

PredictingBattingRates() - Illustrating the Benefits of Multilevel Modeling

This app illustrates the usefulness of a multilevel model in predicting hitting rates.

The inputs are chosen on the left-hand side of the app. One selects a date during the 2019 season. The model is trained on hitting data up to that date and then used to predict rates for hitting data after that date. One also chooses the type of rate (H, SO or HR), the minimum number of AB for batters in the training dataset, and whether to exclude pitchers’ batting from the dataset.

By selecting the Rates tab, one sees a parallel dotplot display of the observed rates and the predictions using the multilevel model. The bottom of the screen shows the sum of squared errors of the observed rates and the multilevel model predictions.
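The app's model is not specified in this document. The sketch below uses simulated data and a simple method-of-moments shrinkage estimate as a stand-in for a multilevel model, only to show why shrunken predictions tend to give a smaller sum of squared errors than the observed rates. All numbers and variable names are assumptions.

```r
set.seed(2019)
n_players <- 200
ab <- 100                                     # at-bats per player in each period
talent <- rbeta(n_players, 74, 206)           # simulated true hit probabilities
train <- rbinom(n_players, ab, talent) / ab   # observed rates before the date
later <- rbinom(n_players, ab, talent) / ab   # observed rates after the date

# Method-of-moments estimate of the between-player variance, then shrink
# each observed rate toward the overall mean.
m <- mean(train)
v_between <- max(var(train) - m * (1 - m) / ab, 1e-6)
K <- m * (1 - m) / v_between - 1              # acts like a prior sample size
multilevel <- (train * ab + K * m) / (ab + K)

sse_observed   <- sum((train - later)^2)
sse_multilevel <- sum((multilevel - later)^2)
c(observed = sse_observed, multilevel = sse_multilevel)
```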

By selecting the Talents tab, one sees the estimated talent curve for the rates.

PredictingBattingRatesPA() - Illustrating the Benefits of Multilevel Modeling for PA data

This app is very similar to the PredictingBattingRates() function. The only difference is that one is looking at rates per plate appearance instead of per at-bat.

PredictiveMaxOfer() - Predictive Checking of a Coin-Flipping Model

This app illustrates predictive checking of a basic model for hitting.

Assume the individual at-bat outcomes are coin flips (Bernoulli trials) with hit probability p, and that p has a Beta distribution with shape parameters a and b.

The ofers are the runs of hitless at-bats between successes in the binary hit outcomes. We are interested in the predictive distribution of the maximum length of an ofer or of the sum of squared ofer lengths among the Bernoulli outcomes.
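Both streaky measures can be computed from run lengths of a 0/1 hit sequence. This is a minimal sketch using base R's rle(). The helper name ofer_lengths() is hypothetical, and for simplicity it also counts hitless runs at the very start and end of the sequence, which may differ from the app's convention.

```r
# Lengths of all runs of consecutive outs (0s) in a 0/1 hit sequence.
ofer_lengths <- function(hits) {
  runs <- rle(hits)
  runs$lengths[runs$values == 0]
}

hits <- c(0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0)
ofers <- ofer_lengths(hits)     # 1 3 2 1
max_ofer <- max(ofers)          # 3
sum_sq_ofer <- sum(ofers^2)     # 15
```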

One selects 90% bounds for the hit probability p. This indirectly selects the shape parameters of the Beta prior. One selects the number of at-bats, the streaky measure of interest, and the observed value of the measure.
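As an illustration of how two bounds can determine a Beta prior, the sketch below finds shape parameters whose 5th and 95th percentiles match the chosen limits. It assumes the 90% bounds are equal-tailed. The function name beta_from_bounds() is hypothetical, and the app may use a different matching method.

```r
beta_from_bounds <- function(lo, hi) {
  # Starting values from a normal approximation to the Beta distribution.
  m <- (lo + hi) / 2
  s <- (hi - lo) / (2 * qnorm(0.95))
  k <- m * (1 - m) / s^2 - 1
  start <- log(c(m * k, (1 - m) * k))

  # Squared distance between the Beta quantiles and the requested bounds;
  # optimizing on the log scale keeps both shape parameters positive.
  loss <- function(log_ab) {
    ab <- exp(log_ab)
    (qbeta(0.05, ab[1], ab[2]) - lo)^2 + (qbeta(0.95, ab[1], ab[2]) - hi)^2
  }
  fit <- optim(start, loss)
  setNames(exp(fit$par), c("a", "b"))
}

shape <- beta_from_bounds(0.20, 0.30)
qbeta(c(0.05, 0.95), shape[["a"]], shape[["b"]])   # should be close to 0.20 and 0.30
```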

One sees a histogram of the streaky measure for 500 simulations of the experiment. If selected, the histogram also marks the observed value with a vertical line, and the app outputs the tail probability.
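The predictive check can be sketched as follows: draw p from the Beta prior, simulate the at-bats, and record the streaky measure, here the maximum ofer. The shape parameters, number of at-bats and observed value below are assumed for illustration and do not come from the package.

```r
set.seed(1)

# Longest run of consecutive outs (0s) in a 0/1 hit sequence.
max_ofer <- function(hits) {
  runs <- rle(hits)
  zero_runs <- runs$lengths[runs$values == 0]
  if (length(zero_runs) == 0) 0 else max(zero_runs)
}

a <- 50; b <- 150     # assumed Beta shape parameters
n_ab <- 500           # assumed number of at-bats
observed <- 25        # assumed observed maximum ofer

# 500 draws from the prior predictive distribution of the maximum ofer.
sim <- replicate(500, {
  p <- rbeta(1, a, b)
  max_ofer(rbinom(n_ab, 1, p))
})

# Proportion of simulated values at least as large as the observed one.
tail_prob <- mean(sim >= observed)

# hist(sim); abline(v = observed)   # uncomment to draw the histogram
```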

PredictiveHotHand() - Predictive Checking of a Markov Switching Model

This app illustrates predictive checking of a streaky measure for a Markov Switching model.

Assume the individual at-bat outcomes are independent Bernoulli with a specific hit probability. For each game, the batter is either in a hot state with hitting probability pH or a cold state with hitting probability pC. The batter moves between the hot and cold states across games by a Markov Chain with staying probability rho. The probabilities pC and pH have independent beta priors.
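This is a minimal simulation sketch of the switching model for fixed values of pC, pH and rho. It makes several simplifying assumptions that are not stated in this document: a fixed number of at-bats per game, an equally likely starting state, and fixed probabilities rather than draws from the beta priors. The function name simulate_hot_hand() is hypothetical.

```r
simulate_hot_hand <- function(n_games, ab_per_game, pC, pH, rho) {
  state <- integer(n_games)            # 1 = hot, 0 = cold
  state[1] <- rbinom(1, 1, 0.5)        # assumed: equally likely starting state
  for (g in seq_len(n_games)[-1]) {
    stay <- rbinom(1, 1, rho)          # stay in the same state with probability rho
    state[g] <- if (stay == 1) state[g - 1] else 1 - state[g - 1]
  }
  p_game <- ifelse(state == 1, pH, pC)
  # Independent Bernoulli at-bats within each game.
  hits <- rbinom(n_games * ab_per_game, 1, rep(p_game, each = ab_per_game))
  list(state = state, hits = hits)
}

set.seed(3)
sim <- simulate_hot_hand(n_games = 150, ab_per_game = 4,
                         pC = 0.20, pH = 0.35, rho = 0.9)
mean(sim$hits)
```

The resulting hits vector can be passed to any streaky measure computed on a 0/1 sequence.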

The ofers are the runs of hitless at-bats between successes in the binary sequence. We are interested in the predictive distribution of the maximum length of an ofer or of the sum of squared ofer lengths among the Bernoulli outcomes.

One selects beta priors for the two probabilities by specifying limits of 90% bounds, and selects values of the staying probability rho. One selects a 2019 player of interest and the streaky measure to consider.

One sees a histogram of the streaky measure for 500 simulations of the experiment. The histogram also marks the observed value with a vertical line, and the app outputs the tail probability.

LogitHomeRunRates() - Comparing Home Run Rates for Two Seasons

This app illustrates the comparison of two seasons of home run hitting. All of the data is accessed from the author’s GitHub site.

One decides on the two seasons to compare and the number of groups for categorization of the launch angle and launch speed values. We consider the “home-run friendly” launch angles between 20 and 40 degrees and launch speeds between 95 and 110 mph.

The range of launch values is divided into subregions. The top graph shows the difference of the logits (season2 minus season1) of the rates of getting batted balls for each subregion. The bottom graph shows the difference in logits of the home run rates for each subregion.
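For a single subregion, the difference in logits can be computed with base R's qlogis(), which returns log(p / (1 - p)). The counts below are made up and the function name logit_diff() is hypothetical.

```r
# Difference in logits of two rates: season 2 minus season 1.
logit_diff <- function(x1, n1, x2, n2) {
  qlogis(x2 / n2) - qlogis(x1 / n1)
}

# Made-up counts for one subregion: 300 home runs in 1000 batted balls in
# season 1 and 250 in 1000 in season 2.
d <- logit_diff(x1 = 300, n1 = 1000, x2 = 250, n2 = 1000)
d   # negative, since the rate is lower in season 2
```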

By pressing the Download button, one can download all of the data used to create the two graphs.

HomeRunPaths() - Shiny app to compare home run paths of a selection of home run leaders

This app provides a set of graphs for comparing the home run paths of a selection of sluggers from MLB history. The hitters are the top 30 leaders in career MLB home runs.

One starts by choosing a small set of players from the input palette. Here we choose Hank Aaron, Reggie Jackson and Gary Sheffield.

The Home Run Paths tab displays the total home run count for each player graphed against age in years.

The Fitted Slopes tab displays a scatterplot of the home run totals and the average count of home runs per year for the selected players.

The Residuals from Fit tab displays smoothed residuals, Actual HR Total Minus Fitted (Straight Line) HR, as a function of age for the selected players. This allows us to see how a player’s home run path deviates from a straight line. Here Jackson and Sheffield both show a tendency to hit a higher than average rate of home runs in their mid-30s.

BerksonBA() - Demonstration of the selection-distortion effect using batting averages

This app demonstrates Berkson’s Paradox or the Selection-Distortion Effect in the context of baseball.

On the left, one selects the season of interest, the minimum number of at-bats and the minimum batting average for all of the batters that season. The graph shows a scatterplot of the in-play rate and the BABIP (batting average on balls in play) value. We see a slight negative association between the two variables.

Now by choosing the minimum BA to be 0.270, we restrict attention to the batters who had at least a .270 batting average. The graph shows the selected points in red, computes the correlation, and displays a best-fitting line for the selected data. Interestingly, the correlation between the two variables for the selected data has strengthened to -0.77.
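The selection effect can be reproduced with simulated data. In this sketch the two variables are generated independently and their product serves as a crude stand-in for batting average. Both are simplifying assumptions, and none of the numbers come from the package's data.

```r
set.seed(42)
n <- 2000
in_play_rate <- runif(n, 0.55, 0.85)   # simulated in-play rates
babip <- rnorm(n, 0.30, 0.03)          # simulated BABIP values
ba_proxy <- in_play_rate * babip       # crude stand-in for batting average

# Keep only the "batters" in the top quarter of the proxy.
selected <- ba_proxy >= quantile(ba_proxy, 0.75)

cor_all <- cor(in_play_rate, babip)
cor_selected <- cor(in_play_rate[selected], babip[selected])
c(all = cor_all, selected = cor_selected)
```

Among the selected batters, a low value of one variable has to be offset by a high value of the other, which induces the negative correlation.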

InPlayRates() - Comparing two seasons with respect to in-play hit and home run rates over launch conditions

In this example, I decide to compare the 2019 and 2021 seasons and focus on hit rates.

The PctSeason 1 tab divides the (launch angle, launch speed) space into 6 x 5 = 30 rectangles and shows the 2019 hit percentage in each region. The PctSeason 2 tab does the same for the 2021 hit rates.
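A grid of this kind can be sketched with cut() and tapply(). The data below are simulated, and the choice of six launch angle bins by five launch speed bins, along with the bin limits, is an assumption; the app's actual breakpoints are not given here.

```r
set.seed(7)
# Simulated balls in play (not real Statcast data).
bip <- data.frame(
  launch_angle = runif(1000, -20, 50),
  launch_speed = runif(1000, 60, 110),
  hit = rbinom(1000, 1, 0.3)
)

# Six equal-width launch angle bins and five equal-width launch speed bins.
angle_bin <- cut(bip$launch_angle, breaks = seq(-20, 50, length.out = 7),
                 include.lowest = TRUE)
speed_bin <- cut(bip$launch_speed, breaks = seq(60, 110, length.out = 6),
                 include.lowest = TRUE)

# Hit percentage in each of the 6 x 5 = 30 rectangles.
hit_pct <- 100 * tapply(bip$hit, list(angle_bin, speed_bin), mean)
round(hit_pct, 1)
```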

The Difference in Pcts tab shows the change in hit percentages from 2019 to 2021.

The Z-Score tab displays the change in hit percentages expressed as a standardized score. Z values smaller than -2 or larger than 2 are meaningful.

InPlayRatesSpray() - Visualizing in-play hit rates over launch conditions over fielding configurations for the 2021 season

This app shows how in-play hit rates depend on the launch conditions and how different fielding configurations can affect these hit rates.

On the left-hand side, one can use the two sliders to select ranges of values of launch angle and launch speed. In addition, one can choose a particular season (either 2019 or 2021) and whether to look at all fielding data, to distinguish different infield configurations, or to distinguish different outfield configurations.

In the snapshot below, I focus on hard-hit ground balls where the launch angle is between -20 and 10 degrees and the launch speed is between 90 and 110 mph. I select 2019 data and all fielding data.

The display shows the in-play hit rate graphed as a function of the adjusted spray angle (here negative values correspond to the pull side and positive values correspond to the opposite side).

For the second example, I again choose hard-hit ground balls but select Infield. The display shows the in-play hit probability for both the Infield Shift and Standard fielding configurations. We see that shifting tends to decrease the hit rate for balls hit to the pull side.

InPlayRatesSpray2() - Comparing 2019 and 2021 season in-play hit rates over launch conditions

This app is useful in comparing in-play hit rates of the 2019 and 2021 seasons.

As in the InPlayRatesSpray() app, one selects ranges of values of launch angle and launch speed of interest. One can also choose a particular hit type (All, singles, doubles, triples and home runs).

In the following snapshot, I focus on hard-hit ground balls (launch angles between -20 and 10 degrees and launch speeds between 90 and 100 mph) and all types of hits. The top display shows the in-play hit rates for the two seasons and the bottom display shows density graphs of adjusted spray angles. The takeaway is that in-play hit rates on ground balls have generally decreased on balls hit to the pull side.

In the second snapshot, I focus on balls hit at launch angles between 20 and 40 degrees and launch speeds between 95 and 105 mph, and I focus on home run rates. The takeaway is that home run rates have generally decreased from 2019 to 2021.

RunsExpectancy() - Displays runs expectancy calculations using Retrosheet data from 2000 to 2019

The runs expectancy matrix is a table that shows the expected number of runs in the remainder of the inning as a function of the number of outs and the runners on base. This app displays the matrix and associated summaries based on data from a particular season.
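The definition above translates directly into a grouped mean. This is a minimal sketch on made-up records; the column names outs, bases and runs_rest and the "000"/"100" coding of the runners are hypothetical and are not the package's data format.

```r
# Made-up plate-appearance records: outs, runners on base at the start of
# the play, and runs scored from that point to the end of the inning.
pa <- data.frame(
  outs      = c(0, 0, 1, 1, 2, 2, 0, 1),
  bases     = c("000", "100", "000", "100", "000", "100", "000", "100"),
  runs_rest = c(1, 2, 0, 1, 0, 0, 0, 0)
)

# Expected runs in the remainder of the inning for each (bases, outs) cell.
re_matrix <- tapply(pa$runs_rest, list(pa$bases, pa$outs), mean)

# Same layout, but the probability of scoring 2 or more runs.
p2_matrix <- tapply(pa$runs_rest >= 2, list(pa$bases, pa$outs), mean)

re_matrix
```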

Here is an illustration of the use of the app. One selects 2000 as the season and indicates that we’ll be considering expected runs in the remainder of the inning.

The runs expectancy matrix is displayed at the top. One graphs the values as a function of the Bases Score, which is equal to one plus the sum of bases occupied plus 1 if there is more than one runner. I fit a separate least-squares line for each number of outs. The intercepts and slopes from these fits are displayed on the left-hand side.

Here I run this app using 2021 data, letting the metric be the probability of scoring 2 or more runs in the remainder of the inning.

StreakyInPlay() - Explores streakiness patterns on estimated BA and wOBA measurements for a specific 2021 hitter

One enters the name of a 2021 batter of interest and the measure of interest (either expected BA or expected wOBA). Here we are considering Bryce Harper and using expected BA. In addition, one inputs the width for the moving average.

The Observed tab displays a moving average plot of the hitting measure as a function of the in-play number. The measure of streakiness, BLUE, is the area of the blue region, where the horizontal line marks the hitter’s average value of the measure for that season.

The Simulated tab illustrates the results of a simulation experiment to measure the significance of the observed BLUE measure. One randomly permutes the observed measurements and computes the moving averages and the BLUE statistic. By repeating this exercise 500 times, one gets a distribution of the BLUE measure under this equal probability model. The tail probability is the probability that the random BLUE value is at least as large as the observed value. Small tail probabilities indicate that we have observed extreme streakiness relative to this probability model.
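The permutation experiment can be sketched as follows. The exact definition of the BLUE statistic is not given in this document, so the sketch assumes it is the sum of absolute deviations of the moving average from the overall mean. The data are simulated and the helper names are hypothetical.

```r
set.seed(2021)

# Centered moving average; incomplete windows at the ends are dropped.
moving_average <- function(x, width) {
  ma <- stats::filter(x, rep(1 / width, width), sides = 2)
  as.numeric(ma[!is.na(ma)])
}

# Assumed form of the streakiness statistic: total absolute deviation of
# the moving average from the overall mean.
streak_stat <- function(x, width) {
  sum(abs(moving_average(x, width) - mean(x)))
}

x <- runif(300)       # made-up stand-in for per-ball expected BA values
width <- 30
observed <- streak_stat(x, width)

# Permuting the values removes any ordering pattern while keeping the
# same set of measurements.
simulated <- replicate(500, streak_stat(sample(x), width))
tail_prob <- mean(simulated >= observed)
```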

Datasets in Package

All of the data for using these apps is included as part of the ShinyBaseball package.

chadwick

This provides the Statcast ids for all Major League Players.

FF_15_20

This dataset provides Statcast data on four-seam fastballs for the seasons 2015 through 2020.

fg2020batting

This dataset contains stats for the FanGraphs leaders for the 2020 season. This data is used for the FourMeasures(), PointBrush(), PointClick() and PointHover() Shiny apps.

game_info

This dataset contains the game ids for a large number of MLB games.

retro2019

This dataset contains event data for all 191,973 plate appearances for the 2019 season.

sc_ip_19_20

This dataset provides in-play Statcast data for the two seasons 2019 and 2021. This is used in the InPlayRatesSpray2() app.

sc_pitcher_2019

This dataset provides Statcast data for the 732,473 pitches in the 2019 season. This data is used for the PitchOutcome() and PitchTypeCount() Shiny apps.

sc2019_ip_radial

This dataset contains launch angle, launch speed, estimated BA, and event data for all balls in play for the 2019 season.

sc2019_ip

This dataset provides Statcast data for the 125,751 balls put in play for the 2019 season. This data is used for the BrushingZone(), SprayChart() and SprayCompare() Shiny apps.

sc2021_ip

This dataset provides Statcast data for the balls put in play for the 2021 season. This data is used for the InPlayRatesSpray() Shiny app.

sc2021_ip3

This dataset provides Statcast data for the balls put in play for the 2021 season. This data is used for the StreakyInPlay() Shiny app.

top30homerun

This dataset from Baseball Reference describes each home run hit by the top 30 career home run hitters in MLB history. This data is used for the HomeRunPaths() Shiny app.

twentyyears_RE

This dataset contains runs expectancy calculations for the twenty seasons 2000 through 2019 as a function of the runners on base and the number of outs. This data is used for the RunsExpectancy() Shiny app.